
    Firmament: Fast, Centralized Cluster Scheduling at Scale

    Centralized datacenter schedulers can make high-quality placement decisions when scheduling tasks in a cluster. Today, however, high-quality placements come at the cost of high latency at scale, which degrades response time for interactive tasks and reduces cluster utilization. This paper describes Firmament, a centralized scheduler that scales to over ten thousand machines at sub-second placement latency even though it continuously reschedules all tasks via a min-cost max-flow (MCMF) optimization. Firmament achieves low latency by using multiple MCMF algorithms, by solving the problem incrementally, and via problem-specific optimizations. Experiments with a Google workload trace from a 12,500-machine cluster show that Firmament improves placement latency by 20x over Quincy [22], a prior centralized scheduler using the same MCMF optimization. Moreover, even though Firmament is centralized, it matches the placement latency of distributed schedulers for workloads of short tasks. Finally, Firmament exceeds the placement quality of four widely-used centralized and distributed schedulers on a real-world cluster, and hence improves batch task response time by 6x. This work was supported by a Google European Doctoral Fellowship, by NSF award CNS-1413920, and by the Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory (AFRL), under contract FA8750-11-C-0249.
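    The MCMF formulation at Firmament's core can be pictured as a flow network: source -> tasks -> machines -> sink, with arc costs encoding placement preferences. A minimal successive-shortest-paths solver sketches the optimization family (our illustration; Firmament's incremental algorithms and cost models are far more sophisticated, and all names here are ours):

```python
from collections import defaultdict

def min_cost_max_flow(num_nodes, edges, src, sink):
    """Successive-shortest-paths min-cost max-flow (illustrative sketch)."""
    graph = defaultdict(list)   # node -> indices into the edge list
    E = []                      # each edge: [to, residual_cap, cost]

    def add_edge(u, v, cap, cost):
        graph[u].append(len(E)); E.append([v, cap, cost])
        graph[v].append(len(E)); E.append([u, 0, -cost])   # residual arc

    for u, v, cap, cost in edges:
        add_edge(u, v, cap, cost)

    flow = total_cost = 0
    while True:
        # Bellman-Ford: cheapest augmenting path in the residual graph
        dist, parent = {src: 0}, {}
        for _ in range(num_nodes):
            changed = False
            for u in list(dist):
                for ei in graph[u]:
                    v, cap, cost = E[ei]
                    if cap > 0 and dist[u] + cost < dist.get(v, float("inf")):
                        dist[v] = dist[u] + cost
                        parent[v] = ei
                        changed = True
            if not changed:
                break
        if sink not in dist:
            return flow, total_cost
        # Push the bottleneck capacity along the cheapest path
        path, v = [], sink
        while v != src:
            path.append(parent[v])
            v = E[parent[v] ^ 1][0]     # paired reverse edge points back to u
        push = min(E[ei][1] for ei in path)
        for ei in path:
            E[ei][1] -= push
            E[ei ^ 1][1] += push
        flow += push
        total_cost += push * dist[sink]
```

    For example, two tasks (nodes 1, 2) and two single-slot machines (nodes 3, 4) with per-placement costs yield the cheapest feasible assignment of all tasks, the same property the scheduler exploits at cluster scale.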

    Learning Scheduling Algorithms for Data Processing Clusters

    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load.
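    The objective Decima's policy optimizes, average job completion time, is easy to state; a toy single-server version (our illustration, not Decima's DAG-aware model) shows why run order alone already matters:

```python
def avg_jct(durations):
    """Average job completion time when jobs run back-to-back on one server.

    Toy version of the objective an RL scheduling policy can minimize; on a
    single machine, running shortest jobs first provably minimizes it.
    """
    finish, total = 0, 0
    for d in durations:
        finish += d          # this job completes at the running sum
        total += finish      # accumulate each job's completion time
    return total / len(durations)
```

    Sorting jobs shortest-first strictly lowers the average whenever the input order differs from it, which is the kind of structure a learned policy must discover on its own.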

    Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters

    Many Big Data processing applications nowadays run on large-scale multi-tenant clusters. Due to hardware heterogeneity and resource contention, the straggler problem has become the norm rather than the exception in such clusters. To handle it, speculative execution has emerged as one of the most widely used straggler mitigation techniques. Although a number of speculative execution mechanisms have been proposed, as we have observed from real-world traces, the questions of "when" and "where" to launch speculative copies have not been fully discussed, which causes performance and energy inefficiencies for Big Data applications. In this paper, we propose a performance model and an energy consumption model to reveal the performance and energy variations under different speculative execution solutions. We further propose a window-based dynamic resource reservation and a heterogeneity-aware copy allocation technique to answer the "when" and "where" questions for speculative execution. Evaluations using real-world traces show that our proposed technique can improve the performance of Big Data applications by up to 30% and reduce the overall energy consumption by up to 34%.
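    The "when" question can be illustrated with a simple progress-rate heuristic: launch a backup copy only if the running copy is projected to take much longer than a restart at typical speed. This is our sketch of the general idea; the paper's window-based reservation and heterogeneity-aware allocation are more involved, and all names below are ours:

```python
def should_speculate(progress, elapsed, peer_rates, threshold=1.5):
    """Decide whether to launch a speculative copy of a running task.

    progress: fraction complete in [0, 1]; elapsed: seconds running;
    peer_rates: progress rates (fraction/sec) of sibling tasks.
    Illustrative heuristic, not the paper's model.
    """
    if elapsed <= 0 or progress >= 1.0:
        return False
    rate = progress / elapsed                       # this copy's speed
    median_rate = sorted(peer_rates)[len(peer_rates) // 2]
    remaining = (1.0 - progress) / rate             # time left on current copy
    fresh_copy = 1.0 / median_rate                  # time for a restarted copy
    return remaining > threshold * fresh_copy
```

    A task at 20% after 100 s among peers progressing ~1.1%/s would be flagged, while a task at 50% after 50 s would not; the "where" question then picks the machine for the copy.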

    Energy efficiency of large scale graph processing platforms

    A number of graph processing platforms have emerged recently as a result of the growing demand for graph data analytics with complex and large-scale graph-structured datasets. These platforms have been tailored for iterative graph computations and can offer an order-of-magnitude performance gain over generic data-flow frameworks like Apache Hadoop and Spark. Nevertheless, the increasing availability of such platforms and their functional overlap necessitates a comparative study on various aspects of the platforms, including applications, performance, and energy efficiency. In this work, we focus on the energy efficiency of some large-scale graph processing platforms. Specifically, we select two representative platforms, Apache Giraph and Spark GraphX, for the comparative study. We compare and analyze the energy consumption of these two platforms with the PageRank, Strongly Connected Component, and Single-Source Shortest Path algorithms over five realistic graphs. Our experimental results demonstrate that GraphX outperforms Giraph in terms of energy consumption. Specifically, Giraph consumes 1.71 times more energy than GraphX on average for the mentioned algorithms.
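    PageRank, one of the three benchmarked algorithms, reduces to a short power iteration; a minimal single-machine sketch (the study's actual runs used the distributed Giraph and GraphX implementations of this same computation):

```python
def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank over an adjacency list {node: [out-neighbors]}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # base teleport mass, then redistribute each node's rank to its links
        new = {v: (1.0 - damping) / n for v in nodes}
        for u, outs in adj.items():
            targets = outs if outs else nodes   # dangling nodes spread evenly
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        rank = new
    return rank
```

    The iterative, communication-heavy nature of this loop is exactly what makes energy consumption differ across platforms that schedule and ship the per-vertex updates differently.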

    Charon: Specialized near-memory processing architecture for clearing dead objects in memory

    Garbage collection (GC) simplifies programming, saving a programmer from many nasty memory-related bugs. However, these productivity benefits come with a cost in terms of application throughput, worst-case latency, and energy consumption. Since the first introduction of GC by the Lisp programming language in the 1950s, a myriad of hardware and software techniques have been proposed to reduce this cost. While the idea of accelerating GC in hardware is appealing, its impact has been very limited due to narrow coverage, lack of flexibility, intrusive system changes, and significant hardware cost. Even with specialized hardware, GC performance is eventually limited by the memory bandwidth bottleneck. Fortunately, emerging 3D stacked DRAM technologies shed new light on this decades-old problem by enabling efficient near-memory processing with ample memory bandwidth. Thus, we propose Charon, the first 3D stacked memory-based GC accelerator. Through a detailed performance analysis of the HotSpot JVM, we derive a set of key algorithmic primitives based on their GC time coverage and implementation complexity in hardware. Then we devise a specialized processing unit to substantially improve their memory-level parallelism and throughput at a low hardware cost. Our evaluation of Charon with the full-production HotSpot JVM running two big data analytics frameworks, Spark and GraphChi, demonstrates a 3.29x geomean speedup and 60.7% energy savings for GC over the baseline 8-core out-of-order processor.
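    The core GC work a near-memory accelerator targets is the pointer-chasing traversal of live objects; a toy mark-and-sweep (our illustration, not HotSpot's collector) makes the irregular memory-access pattern clear:

```python
def mark_sweep(heap, roots):
    """Toy mark-and-sweep over a heap modeled as {obj_id: [referenced ids]}.

    The mark phase's irregular pointer chasing is the bandwidth-hungry
    traversal that near-memory processing can speed up.
    """
    marked, stack = set(), list(roots)
    while stack:                     # mark: everything reachable from roots
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])      # follow outgoing references
    dead = set(heap) - marked
    for obj in dead:
        del heap[obj]                # sweep: reclaim unreachable objects
    return dead
```

    Each step of the mark loop dereferences an essentially random heap address, so throughput is bounded by memory latency and bandwidth rather than compute, which is why stacking the logic next to DRAM helps.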

    Kubernetes cluster optimization using hybrid shared-state scheduling framework

    This paper presents a novel approach for scheduling workloads in a Kubernetes cluster, which are sometimes unequally distributed across the environment or subject to fluctuations in resource utilization. Our proposal is a hybrid shared-state scheduling framework that delegates most tasks to distributed scheduling agents and adds a scheduling correction function that mainly processes unscheduled and unprioritized tasks. Scheduling decisions are made based on the entire cluster state, which is synchronized and periodically updated by a master-state agent. While preserving the Kubernetes objects and concepts, we analyzed the proposed scheduler's behavior under different scenarios; for instance, we tested failover/recovery behavior in a deployed Kubernetes cluster. Moreover, our findings show that in situations such as collocation interference or priority preemption, other centralized scheduling frameworks integrated with Kubernetes might not perform adequately due to the high latency of computing load spreading. In the wider context of existing scheduling frameworks for container clusters, the distributed models' lack of visibility at an upper-level scheduler might generate conflicting job placements. Therefore, our proposed scheduler encompasses the functionality of both centralized and distributed frameworks. By employing a synchronized cluster state, we ensure an optimal scheduling mechanism for resource utilization.
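    The shared-state pattern can be sketched as optimistic commits by distributed agents against a synchronized view of free resources, with conflicts deferred to a correction step (our illustration of the general idea, cf. Omega-style shared-state scheduling, not the paper's implementation; all names are ours):

```python
def commit_claims(free_cpus, claims):
    """Apply agents' placement claims against the shared cluster state.

    free_cpus: {machine: free CPU count}, the synchronized cluster state.
    claims: [(task, machine, cpus)] proposed by distributed agents.
    Claims that no longer fit are rejected for the correction function
    to reschedule against fresher state.
    """
    committed, rejected = [], []
    for task, machine, cpus in claims:
        if cpus <= free_cpus.get(machine, 0):
            free_cpus[machine] -= cpus    # claim commits atomically
            committed.append(task)
        else:
            rejected.append(task)         # state changed under the agent
    return committed, rejected
```

    Because agents work from a periodically synchronized snapshot, two agents can race for the same machine; the correction function absorbs exactly those losers instead of serializing every decision through one scheduler.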